Discovery of vaccine-like recombinant SARS-CoV-2 circulating in humans

For viral diseases, vaccination with a live attenuated vaccine (LAV) is one of the most effective means of control. However, LAV occasionally spills over from vaccinated individuals and circulates in the population, with unforeseen consequences. Currently, SARS-CoV-2 LAVs are undergoing clinical trials. In this study, we found that viruses isolated from SARS-CoV-2-infected persons in India may be candidate LAV-derived strains, indicating a risk of SARS-CoV-2 LAV spillover from vaccinated persons and increasing the complexity of SARS-CoV-2 detection. In addition, the frequent recombination of SARS-CoV-2 increases the chance that an LAV reverts to virulence. Therefore, how to distinguish LAV viruses from wild strains, and how to avoid recombination between circulating vaccine strains and wild strains, are challenges currently faced by SARS-CoV-2 LAV development.


Introduction
Since the outbreak at the end of 2019, the novel coronavirus pneumonia (COVID-19) has continued to rage around the world, seriously endangering human health and life. According to the World Health Organization (WHO), as of May 30, 2022, more than 520 million people had suffered from the disease, with over 6.28 million deaths (https://covid19.who.int/), and these numbers keep growing. Its pathogen is severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which belongs to the genus Betacoronavirus of the family Coronaviridae [3,14]. The genome of SARS-CoV-2 is a positive-sense single-stranded RNA of approximately 30,000 nucleotides, encoding a complex system of structural and non-structural proteins [30], such as the spike protein (S), envelope protein (E), nucleocapsid protein (N), and RNA polymerase [1]. Of these, the S protein, exposed on the surface of the virion, is responsible for binding the receptor protein, angiotensin-converting enzyme 2 (ACE2), and mediates entry of the virus into the host cell after being cleaved into S1 and S2 by the host cell enzyme furin [23]. In addition, the S protein is the protective antigen of SARS-CoV-2: it induces the host to produce neutralizing antibodies that terminate viral replication in the host [5].
Vaccination remains the primary means of controlling COVID-19, although other measures are being developed to prevent and treat the disease. So far, a large number of vaccines have been used clinically or are under development, including inactivated and live attenuated vaccines, recombinant protein or recombinant subunit vaccines, nucleic acid vaccines, and viral vector vaccines. Li and colleagues reviewed the research progress of these vaccines [15]. An ideal SARS-CoV-2 vaccine should elicit strong humoral and cellular immune responses, be easy to store and transport, and be affordable for all countries, especially low- and middle-income countries. Although various SARS-CoV-2 vaccines have been extensively studied and used [21], rapid antigenic drift and epitope loss greatly compromise their effectiveness [25]. SARS-CoV-2 live attenuated vaccines (LAVs) retain unique advantages: they activate all types of host immune response (cellular, humoral and innate); they present all epitopes to the host immune cells, thereby inducing a broad immune response and limiting, as far as possible, the immune escape caused by antigenic drift; and they have relatively low storage, transportation and immunization costs [4,6,19]. These advantages make LAVs especially suitable for less developed regions and countries. Therefore, vaccine research institutions in different countries, including India and the United States (US), are adopting various strategies to construct SARS-CoV-2 LAVs [21].
In July 2021, a group from the US reported the efficacy and safety of a candidate SARS-CoV-2 LAV, COVI-VAC, constructed through a codon deoptimization strategy [28]. In the spike (S) protein gene of this strain, in addition to deletion of the segment encoding the furin cleavage site, 283 point mutations were introduced. Although these artificial mutations do not change the amino acid sequence, the variant is highly attenuated. Its antigenic epitopes are a perfect match to those of the circulating wild-type (WT) strain, providing the capacity for a broad immune response and making the vaccine more likely to retain efficacy. In Vero E6 cells, COVI-VAC is temperature sensitive and replicates to high titer. After COVI-VAC vaccination, Syrian golden hamsters did not show significant pathological changes. In vitro, the sera of immunized hamsters neutralized the WT virus. In vivo, when hamsters were challenged with the WT virus, COVI-VAC vaccination reduced viral titers in the lung, rendered the virus undetectable in the brain, and protected hamsters from almost all virus-associated weight loss. Moreover, a single intranasal dose provided sufficient protection for the inoculated animals. These advantages make COVI-VAC a promising candidate for mass vaccination. Groups from many countries have also reported progress in LAV research [16,24,26].
SARS-CoV-2 is prone to homologous recombination between viruses [12,13,27]. An LAV virus that spills over from the vaccinated population into the environment is therefore likely to recombine with circulating strains, producing new circulating strains with unpredictable consequences. Knowing whether LAVs spill over into the environment from vaccinated individuals is thus essential for the safety assessment of LAVs. So far, there have been no reports in this regard. India, one of the largest developing countries, is conducting research on LAVs [21]. Therefore, in this study, we analyzed the genome sequences of SARS-CoV-2 isolated from infected persons in India before July 2021 and deposited in the SARS-CoV-2 databases, to explore the possibility of LAV spillover and to provide a reference for LAV research.

Virus sequences
From the SARS-CoV-2 databases of GenBank or GISAID, we collected the genome sequences of 1643 SARS-CoV-2 isolates obtained from infected individuals in India before August 2021. The sequences were aligned with the MUSCLE program in the MEGA X software package [11], and the optimized alignment was used for subsequent analysis.

Recombination analysis
To determine whether any of the viruses had undergone genetic recombination, the recombination analysis software RDP 3.0 [20] was applied to the aligned data set to preliminarily screen for recombinant sequences. The SimPlot program [18] was then used to visualize the genomic sequence similarity between each putative recombinant and its potential parental viruses, so as to further assess the reliability of the recombination signal.
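The similarity scan described above can be illustrated with a minimal sliding-window sketch. It is not the SimPlot implementation, which offers distance-corrected models; it computes raw percent identity per window between two rows of the same alignment and skips gaps and ambiguous bases. The function name, window size, and step size are illustrative assumptions.

```python
from typing import List, Tuple

UNAMBIGUOUS = set("ACGT")


def windowed_identity(query: str, reference: str,
                      window: int = 200, step: int = 20) -> List[Tuple[int, float]]:
    """Return (window midpoint, fraction of identical sites) along an alignment.

    Both sequences must be rows of the same alignment (equal length).
    Columns with a gap or an ambiguous base in either row are ignored.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must come from the same alignment")
    query, reference = query.upper(), reference.upper()
    profile = []
    for start in range(0, len(query) - window + 1, step):
        compared = matches = 0
        for q, r in zip(query[start:start + window], reference[start:start + window]):
            if q in UNAMBIGUOUS and r in UNAMBIGUOUS:
                compared += 1
                matches += q == r
        if compared:  # skip windows with no comparable columns
            profile.append((start + window // 2, matches / compared))
    return profile


if __name__ == "__main__":
    # Toy alignment: the middle third of the query differs from the reference.
    ref = "A" * 300
    qry = "A" * 100 + "C" * 100 + "A" * 100
    for midpoint, identity in windowed_identity(qry, ref, window=100, step=100):
        print(midpoint, identity)
```

A sharp local drop in such a profile, against an otherwise near-identical background, is the kind of pattern that marks a candidate recombination region for closer inspection.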

Phylogenetic analysis
To determine the phylogeny of the viruses with recombination signals, we downloaded the LAV strain and reference viruses of different genotypes from the SARS-CoV-2 database (Table 1) and analyzed their phylogenetic history. Before phylogenetic reconstruction, the nucleotide substitution model selection tool MODELS in the MEGA X software package [11] was used to find the optimal substitution model. The phylogeny was then reconstructed by the maximum likelihood method under that model. The robustness of each branch was assessed by bootstrapping with 1000 replicates, and branches with bootstrap values > 70% were regarded as robustly supported.

Results and discussion
Analysis of more than 1600 isolates from India revealed two isolates with significant recombination signals. In the RDP results, five methods (GENECONV, MaxChi, Chimaera, SiScan and 3Seq) gave significant positive recombination signals (p < 0.01) (Table 2), and the recombination region was inferred to lie within the S gene.
The two viruses with the recombination signal were isolated from two infected individuals on June 30, 2020. After removal of ambiguous bases, their genome sequences were 99.99% similar, differing at only four bases in the S gene (Fig. 1A). Compared with the wild strains Wuhan-Hu-1 and HKU-SZ-005b, isolated early in the outbreak, the two Indian isolates were almost identical (> 99.9%) to both reference sequences except in the S gene. Within a local region of the S gene, however, their similarity was less than 90% (Fig. 1B). This highly variable region is located in the S2 coding region (Fig. 1B). The annual substitution rate of the S gene of SARS-CoV-2 is about 5.7 × 10⁻⁴ substitutions per site [8]. Therefore, if the variation were caused by natural mutation, the S gene of these two Indian isolates would be expected to differ from other SARS-CoV-2 isolates by at most 3-4 bases. This means that a parent virus capable of donating the S2 region to these two Indian isolates may not exist in nature. The putative recombination regions in the genomes of the two Indian isolates are therefore unlikely to originate from recombination between SARS-CoV-2 strains circulating in nature, and are more likely to be the product of genetic engineering.
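The order-of-magnitude argument above can be checked with a short calculation. The sketch below multiplies the quoted rate by a gene length and an elapsed time, then asks how probable a much larger number of differences would be under a simple Poisson model. The S gene length (3822 nt in the Wuhan-Hu-1 reference), the half-year elapsed time, the Poisson model, and the threshold of 90 differences (taken from the roughly 90 substituted codons reported for the S2 region) are assumptions made for illustration, not values stated by the rate estimate itself.

```python
import math


def expected_substitutions(rate_per_site_year: float, length_nt: int, years: float) -> float:
    """Expected number of substitutions accumulated along one lineage."""
    return rate_per_site_year * length_nt * years


def poisson_tail(k: int, mean: float, terms: int = 500) -> float:
    """P(X >= k) for X ~ Poisson(mean), with each term computed in log space."""
    return sum(
        math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))
        for i in range(k, k + terms)
    )


if __name__ == "__main__":
    RATE = 5.7e-4     # substitutions per site per year, as quoted in the text [8]
    S_GENE_NT = 3822  # assumption: S gene length in the Wuhan-Hu-1 reference
    YEARS = 0.5       # assumption: about half a year of circulation by June 2020

    per_lineage = expected_substitutions(RATE, S_GENE_NT, YEARS)
    between_two = 2 * per_lineage  # two lineages diverging from a common ancestor
    print(f"expected differences between two isolates: {between_two:.2f}")
    print(f"P(at least 90 differences): {poisson_tail(90, between_two):.3e}")
```

Under these assumptions the expected divergence is a handful of bases, the same order of magnitude as the figure given in the text, while a divergence of tens of sites within one region is vanishingly improbable by point mutation alone. The exact numbers shift with the assumed time span and gene length.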
The codon substitution signature also indicated that the S2 region of these two Indian isolates was the product of artificial editing. We analyzed the substitution sites in the S2 region of isolate 5844 and found that, after removal of ambiguous bases, the codons of approximately 90 amino acids carried nucleotide substitutions relative to the earliest SARS-CoV-2 isolate, Wuhan-Hu-1. Although substitutions appeared in the first position of a very few codons, almost all of them occurred in the third position. Notably, all of these substitutions were between synonymous codons and did not change any amino acid of the S protein (Fig. 2). This regular substitution pattern differs markedly from natural mutation in the S gene of SARS-CoV-2 and is more consistent with an LAV constructed by genetic recombination after artificial editing of the S2 region.
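A minimal sketch of this kind of codon-level tally follows. It assumes two in-frame coding sequences of equal length taken from the same alignment, uses the standard genetic code, and counts changed bases by codon position together with synonymous and nonsynonymous codons. The function name and the toy sequences are illustrative and do not come from the study's data.

```python
from typing import Dict

# Standard genetic code, laid out in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon_changes(ref_cds: str, alt_cds: str) -> Dict[str, int]:
    """Tally substitutions between two aligned, in-frame coding sequences.

    Codons containing gaps or ambiguous bases in either sequence are skipped.
    """
    if len(ref_cds) != len(alt_cds):
        raise ValueError("coding sequences must be aligned to equal length")
    ref_cds, alt_cds = ref_cds.upper(), alt_cds.upper()
    counts = {"pos1": 0, "pos2": 0, "pos3": 0, "synonymous": 0, "nonsynonymous": 0}
    for start in range(0, len(ref_cds) - 2, 3):
        ref_codon = ref_cds[start:start + 3]
        alt_codon = alt_cds[start:start + 3]
        if ref_codon not in CODON_TABLE or alt_codon not in CODON_TABLE:
            continue  # gap or ambiguous base in this codon
        if ref_codon == alt_codon:
            continue
        for offset in range(3):
            if ref_codon[offset] != alt_codon[offset]:
                counts[f"pos{offset + 1}"] += 1
        if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


if __name__ == "__main__":
    # Toy example: Met-Ala-Leu encoded two ways (GCT->GCC, CTG->TTG).
    print(classify_codon_changes("ATGGCTCTG", "ATGGCCTTG"))
```

In a tally of this kind, a pattern of many changed codons, nearly all at the third position and none altering the encoded amino acid, is what the text describes as the signature of deliberate synonymous recoding.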
Based on the above analysis, the two Indian SARS-CoV-2 isolates are likely to have resulted from LAV spillover rather than from natural recombination between circulating viruses. To test this hypothesis, we used the S2 region of isolate 5844 as the query sequence and searched the SARS-CoV-2 database in GenBank for the most similar virus. The genomic sequence of the candidate LAV COVI-VAC, which was undergoing phase I clinical trials, had up to 99.6% similarity to the query sequence. As of June 2022, with the exception of the vaccine strain COVI-VAC, we had not found any wild circulating strain that is more than 95% similar to the two Indian isolates in this region (Fig. 3A). This also suggests that their S gene is unlikely to have arisen through natural evolution of SARS-CoV-2. Comparing the whole genome sequence of isolate 5844 with that of COVI-VAC, we found that their differences lay in the S2 region, where a total of 21 bases were substituted, while other regions were almost unchanged. We also noticed that, unlike isolate 5844, COVI-VAC lacks the furin cleavage site (Fig. 3B). These results indicate that the two Indian strains might not be derived directly from COVI-VAC.
To test whether these two Indian SARS-CoV-2 isolates may be live attenuated vaccine-derived strains, we reconstructed their phylogenetic histories. Whether the S2 region or the rest of the genome was used, these two Indian isolates and the candidate LAV COVI-VAC formed a monophyletic group (Fig. 4), supporting the interpretation that they are spillover vaccine strains.
Although we cannot yet determine their actual parents, the above results show that the two SARS-CoV-2 isolates from India might be derived from LAV candidate strains. Moreover, their S gene is most likely the product of genetic engineering by codon deoptimization. Fortunately, apart from these two Indian viruses, we have so far found no other circulating viruses homologous to them in the SARS-CoV-2 databases, suggesting that these LAV-derived viruses have not spread widely in the population.
Before July 2021, LAVs were still at the stage of laboratory research or phase I clinical trials [21]. The two Indian isolates were collected in June 2020, indicating that they may be viruses that escaped during animal or clinical trials of LAVs, or that leaked from laboratories. This finding suggests a risk of LAV spillover into the environment, which may have unpredictable consequences for SARS-CoV-2 control. The immediate impact would be to complicate SARS-CoV-2 surveillance. According to WHO recommendations, a positive real-time PCR test for viral nucleic acid is the gold standard for determining whether someone is infected with SARS-CoV-2. However, people infected by a spilled-over LAV will also test positive for viral nucleic acid, making it difficult to determine whether they are infected with wild SARS-CoV-2. Therefore, how to distinguish vaccine strains circulating in the environment from wild strains is one of the challenges faced by SARS-CoV-2 LAV development. In this sense, constructing LAVs that carry a gene deletion may be a good option, because the deletion could serve as a marker that separates vaccine strains from wild strains.
Another issue posed by the spillover of LAVs is how to avoid reversion to virulence of vaccine strains circulating in the environment. Theoretically, because multiple point mutations are introduced during the construction of LAVs, a SARS-CoV-2 LAV is unlikely to revert to virulence through point mutation alone. However, homologous recombination among viruses is the intrinsic genetic mechanism by which SARS-CoV-2 evolves rapidly [2,17,22,27]. If homologous recombination occurs between wild viruses and vaccine strains circulating in the environment, the consequences are unpredictable. One lesson comes from the WHO Global Polio Eradication Program. Through large-scale coverage with oral poliovirus vaccine, global control of poliomyelitis has achieved good results and has almost met the WHO goal of eradicating wild poliovirus [7,29]. Unfortunately, over several years in the early 2000s, Africa saw several outbreaks of polio associated with attenuated vaccination [10]. In-depth research found that these outbreaks were caused by reversion of vaccine virulence following recombination of the spilled-over LAV with enteroviruses [9], which seriously interfered with the polio eradication plan. Therefore, how to prevent vaccine viruses from spilling over and recombining with wild coronaviruses is also an issue that must be considered in the development of SARS-CoV-2 LAVs.
In conclusion, this study found that there might be some LAV-like strains among the SARS-CoV-2 strains circulating in the Indian population. In the phylogenetic trees inferred from different regions of the genome, they fall into the LAV lineage and thus may result from spillover of LAV. This finding suggests a risk that live attenuated vaccines escape from vaccinated individuals into the environment, thereby increasing the complexity of SARS-CoV-2 control. In addition, recombination of attenuated vaccines with wild viruses may also have unforeseen consequences. Therefore, how to avoid recombination between vaccine strains circulating in the environment and wild strains is an important challenge in LAV research.

Fig. 3 Sequence comparison between the Indian isolate 5844 and its homologous viruses. A Result of BLAST analysis performed in GenBank using the S2 region of the Indian isolate 5844 as the query (BLAST was performed on June 10, 2022). Sequence similarity between isolate 5844 and other viruses is indicated by a red box. B Genome sequence comparison of the Indian isolate 5844 with the live attenuated vaccine strain COVI-VAC. The vertical axis is the sequence similarity between viruses, the horizontal axis is the position in the virus genome, and the query sequence used for comparison is isolate 5844.